Adaptive Data Partitioning Using Probability Distribution

Authors

  • Xipeng Shen
  • Yutao Zhong
  • Chen Ding
Abstract

Many computing problems benefit from dynamic data partitioning, that is, dividing a large amount of data into smaller chunks with better locality. When data can be sorted, two methods are commonly used in partitioning. The first selects pivots, which enables balanced partitioning but incurs a large overhead of up to half of the sorting time. The second uses simple functions, which is fast but requires that the input data conform to a uniform distribution. In this paper, we propose a new method that partitions data using the cumulative distribution function. It partitions data of any distribution in linear time, independent of the number of sublists to be partitioned into. Experiments show 10-30% improvement in partitioning balance and 20-70% reduction in partitioning overhead. The new method is more scalable than existing methods: it yields greater benefit as the data set and the number of sublists grow larger. By applying this method, our sequential sorting beats quicksort by 20% and our parallel sorting exceeds the previous sorting algorithm by 33-50%.
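As a rough illustration of the idea in the abstract, and not the authors' implementation, the sketch below partitions a list by an estimated cumulative distribution function (CDF). It assumes the CDF is approximated from a random sample with equal-width histogram cells and linear interpolation inside each cell. The names `estimate_cdf`, `partition_by_cdf`, `cells`, and `sample_size` are hypothetical. Each element is placed with a constant-time lookup, so the partitioning pass is linear in the input size and does not depend on the number of sublists `k`.

```python
import random


def estimate_cdf(sample, cells=256):
    """Return a piecewise-linear CDF estimate built from `sample`.

    Assumption for this sketch: equal-width cells over [min, max] of the sample.
    """
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / cells or 1.0  # avoid division by zero when all values are equal
    counts = [0] * cells
    for x in sample:
        counts[min(cells - 1, int((x - lo) / width))] += 1

    # cum[i] is the estimated probability of a value falling below cell i.
    cum = [0.0]
    running = 0
    for c in counts:
        running += c
        cum.append(running / len(sample))

    def cdf(x):
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        pos = (x - lo) / width
        i = min(cells - 1, int(pos))
        # Interpolate linearly inside the cell.
        return cum[i] + (cum[i + 1] - cum[i]) * (pos - i)

    return cdf


def partition_by_cdf(data, k, sample_size=2000, seed=0):
    """Split `data` into k sublists of roughly equal size in one linear pass."""
    buckets = [[] for _ in range(k)]
    if not data:
        return buckets
    rng = random.Random(seed)
    sample = data if len(data) <= sample_size else rng.sample(data, sample_size)
    cdf = estimate_cdf(sample)
    for x in data:
        # CDF(x) lies in [0, 1]; scaling by k gives the sublist index.
        buckets[min(k - 1, int(cdf(x) * k))].append(x)
    return buckets
```

Since the estimated CDF is non-decreasing, values in an earlier sublist should not exceed values in a later one, so sorting each sublist and concatenating them gives a sorted result. How even the sublists are depends on how well the sample reflects the data.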

Similar resources

Predicting Implantation Outcome of In Vitro Fertilization and Intracytoplasmic Sperm Injection Using Data Mining Techniques

Objective: The main purpose of this article is to choose the best predictive model for IVF/ICSI classification and to calculate the probability of IVF/ICSI success for each couple using artificial intelligence. We also aimed to find the most effective factors for predicting ART success in infertile couples. Materials and Methods: In this cross-sectional study, the data of 486 patients are colle...


Hydrograph Estimation based on Various Components of Rainfall Using Adaptive Neuro-Fuzzy Inference System in Kasilian Watershed

Preparing and estimating flood hydrographs provides comprehensive information for soil and water managers and planners. However, it is not feasible to prepare them for all watersheds. Therefore, suitable flood hydrograph estimation and modeling using available rainfall data seems necessary. The study area is located in Kasilian representative watershed in Mazandaran province compr...


Adaptive Clutter Edge Estimation in Weibull Clutter Using the UMPI Pre-detector

In radar detection, the existence of the clutter edge in the reference samples considerably degrades the performance of the detector. Hence, clutter edge estimation not only improves the CFAR detectors, but also can be used for partitioning the various areas of the clutter in the clutter map. In this paper, we propose an adaptive algorithm for detecting the clutter edge between two Weibull clut...


ARMaDA: An Adaptive Application-sensitive Partitioning Framework for SAMR Applications

Distributed implementations of dynamic adaptive mesh refinement techniques offer the potential for accurate solutions of physically realistic models of complex physical phenomena. However, configuring and managing the execution of these applications presents significant challenges in resource allocation, data distribution and load balancing, communication and coordination, and runtime management...


An Adaptive Approach to Increase Accuracy of Forward Algorithm for Solving Evaluation Problems on Unstable Statistical Data Set

Nowadays, hidden Markov models are extensively utilized for modeling stochastic processes. These models help researchers establish and implement the desired theoretical foundations using Markov algorithms such as the Forward algorithm. However, using the stability hypothesis and the mean statistic for determining the values of Markov functions on unstable statistical data sets has led to a significant reducti...




Publication date: 2003